Practical Adversarial Combinatorial Bandit Algorithm via Compression of Decision Sets

Authors

  • Shinsaku Sakaue
  • Masakazu Ishihata
  • Shin-ichi Minato
Abstract

We consider the adversarial combinatorial multi-armed bandit (CMAB) problem, whose decision set can be exponentially large with respect to the number of given arms. To avoid dealing with such large decision sets directly, we propose an algorithm performed on a zero-suppressed binary decision diagram (ZDD), which is a compressed representation of the decision set. The proposed algorithm achieves either O(T^{2/3}) regret with high probability or O(√T) expected regret as the any-time guarantee, where T is the number of past rounds. Typically, our algorithm works efficiently for CMAB problems defined on networks. Experimental results show that our algorithm is applicable to various large adversarial CMAB instances, including adaptive routing problems on real-world networks.
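
The idea of working on a compressed decision set instead of enumerating it can be illustrated with a small sketch. The code below is not the authors' algorithm; it only shows, under stated assumptions, how a set can be sampled with probability proportional to the product of its arm weights by dynamic programming over a ZDD-like graph. The toy diagram ZDD (all 2-element subsets of three arms), the node names, the loss values, and the learning rate eta are all hypothetical.

```python
import math
import random

# Hypothetical toy ZDD: node -> (arm, lo_child, hi_child).
# Taking the hi edge includes the arm, the lo edge excludes it.
# "T" is the accepting terminal, "F" the rejecting one.
# It encodes the decision sets {0,1}, {0,2}, {1,2}.
ZDD = {
    "n0": (0, "n1a", "n1b"),
    "n1a": (1, "F", "n2"),
    "n1b": (1, "n2", "T"),
    "n2": (2, "F", "T"),
}
ROOT = "n0"


def partition(zdd, weights):
    """For each node, sum the weight products of all paths from it to "T"."""
    z = {"T": 1.0, "F": 0.0}

    def visit(node):
        if node not in z:
            arm, lo, hi = zdd[node]
            z[node] = visit(lo) + weights[arm] * visit(hi)
        return z[node]

    for node in zdd:
        visit(node)
    return z


def sample(zdd, root, weights, rng):
    """Draw one decision set with probability proportional to its weight product."""
    z = partition(zdd, weights)
    chosen, node = [], root
    while node not in ("T", "F"):
        arm, lo, hi = zdd[node]
        p_hi = weights[arm] * z[hi] / z[node]
        if rng.random() < p_hi:
            chosen.append(arm)
            node = hi
        else:
            node = lo
    return chosen


# Assumed exponential weights built from made-up cumulative loss estimates.
eta = 1.0
losses = [0.0, 1.0, 2.0]
weights = [math.exp(-eta * loss) for loss in losses]
z = partition(ZDD, weights)
draw = sample(ZDD, ROOT, weights, random.Random(0))
```

The cost of both functions grows with the number of ZDD nodes rather than with the number of decision sets, which is the point of the compression.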

Similar resources

Stochastic and Adversarial Combinatorial Bandits

This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting, we first derive problem-specific regret lower bounds, and analyze how these bounds scale with the dimension of the decision space. We then propose COMBUCB, algorithms that efficiently exploit the combinatorial structure of the problem, and derive finite-time upper bound on thei...

Combinatorial Bandits Revisited

This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound, and discuss its scaling with the dimension of the decision space. We propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret....

Online combinatorial optimization with stochastic decision sets and adversarial losses

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable c...

Efficient Algorithms for Adversarial Contextual Learning

We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: i) in the transductive setting the learner k...

More Adaptive Algorithms for Adversarial Bandits

We develop a novel and generic algorithm for the adversarial multi-armed bandit problem (or more generally the combinatorial semi-bandit problem). When instantiated differently, our algorithm achieves various new data-dependent regret bounds improving previous work. Examples include: 1) a regret bound depending on the variance of only the best arm; 2) a regret bound depending on the first-order...

Journal title:
  • CoRR

Volume abs/1707.08300  Issue

Pages  -

Publication date 2017